Segmentation of DNA into Coding and Noncoding Regions Based on Recursive Entropic Segmentation and Stop-Codon Statistics

نویسندگان

  • Daniel Nicorici
  • Jaakko Astola
چکیده

Heterogeneous DNA sequences can be partitioned into homogeneous domains that are comprised of the four nucleotides A, C, G, and T and the stop codons. Recursively, we apply a new entropic segmentation method on DNA sequences using Jensen-Shannon and Jensen-Rényi divergences in order to find the borders between coding and noncoding DNA regions. We have chosen 12and 18-symbol alphabets that capture (i) the differential nucleotide composition in codons and (ii) the differential stop-codon composition along all the three phases in both strands of the DNA. The new segmentation method is based on the Jensen-Rényi divergence measure, nucleotide statistics, and stop-codon statistics in both DNA strands. The recursive segmentation process requires no prior training on known datasets. Consequently, for three entire genomes of bacteria, we find that the use of nucleotide composition, stop-codon composition, and Jensen-Rényi divergence improve the accuracy of finding the borders between coding and noncoding regions in DNA sequences.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Finding borders between coding and noncoding DNA regions by an entropic segmentation method.

We present a new computational approach to finding borders between coding and noncoding DNA. This approach has two features: (i) DNA sequences are described by a 12-letter alphabet that captures the differential base composition at each codon position, and (ii) the search for the borders is carried out by means of an entropic segmentation method which uses only the general statistical propertie...

متن کامل

Applications of Recursive Segmentation to the Analysis of DNA Sequences

Recursive segmentation is a procedure that partitions a DNA sequence into domains with a homogeneous composition of the four nucleotides A, C, G and T. This procedure can also be applied to any sequence converted from a DNA sequence, such as to a binary strong(G + C)/weak(A + T) sequence, to a binary sequence indicating the presence or absence of the dinucleotide CpG, or to a sequence indicatin...

متن کامل

Segmentation of short human exons based on spectral features of double curves

This paper presents a new segmentation method based on spectral analysis to locate borders between short protein coding regions and non-coding regions. We formulate the innovative double curve representation of a DNA sequence and apply local three-codon measurement on the discrete Fourier spectral features at 1/3 frequency to identify short protein coding regions. The proposed spectral segmenta...

متن کامل

Divergence Measures for Dna Segmentation

Entropy-based divergence measures have shown promising results in many areas of engineering and image processing. In this study, we use the Jensen-Shannon and Jensen-Rényi divergence measures for DNA segmentation. Based on these information theoretic measures and protein shape coded in DNA, we propose a new approach to the problem of finding the borders between coding and noncoding DNA regions....

متن کامل

A Pixon-based Image Segmentation Method Considering Textural Characteristics of Image

Image segmentation is an essential and critical process in image processing and pattern recognition. In this paper we proposed a textured-based method to segment an input image into regions. In our method an entropy-based textured map of image is extracted, followed by an histogram equalization step to discriminate different regions. Then with the aim of eliminating unnecessary details and achi...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • EURASIP J. Adv. Sig. Proc.

دوره 2004  شماره 

صفحات  -

تاریخ انتشار 2004